Abstract

As part of a project to develop more accurate estimates of the risks due to tropical cyclones, 
we describe a non-parametric method for the statistical simulation of the location of tropical cyclone 
genesis. The method avoids the use of arbitrary grid boxes, and the spatial smoothing of the historical 
data is constructed optimally according to a clearly defined merit function.

1 Introduction 

We are interested in developing more accurate methods for the estimation of the various risks associated 
with tropical cyclones, such as the risks of extreme winds, extreme rainfall and extreme storm surge. The 
methods for tropical cyclone risk assessment described in the scientific and engineering literature can be 
categorised into local methods and basin-wide methods. Local methods estimate the risk of high winds, 
rain or surge at a coastal location using information from historical tropical cyclone events that made 
landfall near that location. Basin-wide methods estimate risks using a model for the entire life-cycle 
of tropical cyclones, from genesis to lysis. There are also methods in between, which model a part of
the life-cycle of tropical cyclones. Some examples of local methods are described in Jagger et al. (2001)
and Murnane et al. (2000), and some examples of basin-wide methods are described in Drayton (2000),
Fujii and Mitsuta (1975), Vickery et al. (2000), Emanuel et al. (2005), Darling (1991) and Chu and Wang
(1998). The basin-wide methods, if they can be made to work well, are potentially the most accurate
and the most useful. They are potentially the most accurate because they make use of all the available 
historical data, and they are potentially the most useful because they can give estimates of any of the 
various risks associated with tropical cyclones. For instance, a basin-wide model could be used to estimate
the risk of a tropical cyclone occurring at any point in the basin, not only over land: this could be useful
for shipping and the offshore oil industry. It could also be used to estimate the risk of a tropical
cyclone having an impact on more than one location during its lifetime, such as the risk of a hurricane
hitting both Puerto Rico and Florida: this could be useful for insurance companies that may sell insurance
in both places. Basin-wide models also have another advantage, which is that they can accommodate the
inclusion of weather, seasonal and year-ahead forecasts more easily than local models.

Given these potential advantages of basin-wide tropical cyclone risk models, and because of various 
shortcomings in the basin-wide methods described in the literature, we have initiated a project to build 
a new basin-wide tropical cyclone model from scratch. One of the features of the model is that we 
are paying great attention to the use of carefully designed statistical procedures and methodologies. In 
particular: 

• By using non-parametric statistical methods we avoid the use of arbitrary grid boxes within the 
basin. The modelled properties of tropical cyclones are allowed to vary smoothly in space (and 
time), as they presumably do in reality. 

• We use a merit function (the likelihood) that allows us to perform an objective comparison among
different models.

• All lengthscales and timescales used to select the data used in the model are derived optimally
according to the merit function.

• We evaluate our merit function in out-of-sample tests, to avoid overfitting and to account correctly
for parameter uncertainty.

• We start with simple methods, and build up to more complex methods, again to avoid overfitting. 

Following the various methods described in the literature (and cited above), we divide the problem 
of basin-wide tropical cyclone modelling into various steps: 

1. Modelling annual rates 

2. Modelling the distribution of genesis in space and time 

3. Modelling tracks 

4. Modelling intensity along tracks, including lysis 

5. Modelling wind fields 

6. Modelling storm surge 

7. Modelling rainfall 

So far we have only considered step 3, the modelling of the shape of tropical cyclone tracks
(see Hall and Jewson (2005a), Hall and Jewson (2005b) and Hall and Jewson (2005c)). We have now
turned our attention to step 2, the modelling of genesis, and that is the subject of this article. A first



As in our track modelling we are focussing initially on the Atlantic basin, and the data we use is the
'official' National Hurricane Centre track data set, known as HURDAT (Jarvinen et al., 1984). We only
use data from 1950 onwards, since this is the only data that we consider to be sufficiently reliable. Reliability
increases from 1950 onwards because of the routine use of aircraft reconnaissance to observe storm
intensity. HURDAT data from 1950 to 2004 contains 524 tropical cyclones, and these are the data
that we will use. Each storm has a unique genesis point, and these 524 genesis locations are the
input for our statistical model for genesis. 



As discussed in the introduction, our aim is to build models that possess a number of desirable features. 
The genesis model that we now describe, which possesses these features, is a two-dimensional kernel density
with the bandwidth fitted using cross-validation. The two dimensions are longitude and latitude: at this 
point we ignore variations in genesis by season or by year, although we intend to consider these in a later 
study. The model gives a probability density $f(x, y)$ for tropical cyclone genesis at the point $(x, y)$ of:

$$f(x, y) = \frac{1}{n \sigma_x \sigma_y} \sum_{i=1}^{n} K\left(\frac{x - x_i}{\sigma_x}, \frac{y - y_i}{\sigma_y}\right)$$

where the $x_i$ and $y_i$ are the longitudes and latitudes of the $n$ historical genesis points, $\sigma_x$ and $\sigma_y$ are
bandwidths in the longitudinal and latitudinal directions, and $K$ is a kernel function. Large values for
the bandwidths create a very smooth density and small values create a very multi-modal density.
The kernel $K$ must satisfy

$$\int\!\!\int K(u, v)\, du\, dv = 1$$

and the $\frac{1}{n \sigma_x \sigma_y}$ term ensures that

$$\int\!\!\int f(x, y)\, dx\, dy = 1$$

as it must for $f$ to be a probability density.
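
As an illustration only, a kernel density of this kind can be sketched in a few lines of Python. The sketch assumes planar coordinates, a Gaussian kernel as one admissible choice of K, and the usual normalisation by the number of points times the two bandwidths. The function names and the toy points are hypothetical and are not taken from our model code. The last function checks numerically that the resulting density integrates to approximately one.

```python
import math


def gaussian_kernel(u, v):
    """Standard bivariate Gaussian kernel; it integrates to 1 over the plane."""
    return math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi)


def kernel_density(x, y, points, sigma_x, sigma_y, kernel=gaussian_kernel):
    """Kernel density estimate at (x, y) from a list of (x_i, y_i) points."""
    n = len(points)
    total = sum(
        kernel((x - xi) / sigma_x, (y - yi) / sigma_y) for xi, yi in points
    )
    # Dividing by n * sigma_x * sigma_y makes the estimate integrate to 1.
    return total / (n * sigma_x * sigma_y)


def integrate_density(points, sigma_x, sigma_y, lo=-10.0, hi=10.0, steps=200):
    """Midpoint-rule integral of the density over the square [lo, hi]^2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        for j in range(steps):
            y = lo + (j + 0.5) * h
            total += kernel_density(x, y, points, sigma_x, sigma_y)
    return total * h * h


# Hypothetical toy points in arbitrary planar units (not HURDAT data).
toy_points = [(-1.0, 0.5), (0.0, 0.0), (2.0, -1.0)]
print(integrate_density(toy_points, sigma_x=1.0, sigma_y=0.5))
```

Passing the kernel as an argument keeps the sketch close to the general form of the density, so that other kernels could be substituted without changing the normalisation.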

For convenience and simplicity we use a Gaussian kernel with $\sigma_x = \sigma_y = \sigma$, and so:

$$K(u, v) = \frac{1}{2\pi} \exp\left(-\frac{u^2 + v^2}{2}\right)$$

which gives:

$$f(x, y) = \frac{1}{2\pi n \sigma^2} \sum_{i=1}^{n} \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{2\sigma^2}\right)$$

The optimal bandwidth $\sigma$ is determined using a jack-knife cross-validation procedure as follows:

• We loop over a range of values for the bandwidth 

• We loop over the 55 years of data 

• For each value of the bandwidth, for each year of data, and for each genesis point that occurs within
that year, we calculate the density at that point using the Gaussian kernel density expression above,
but eliminating all the data points in the same year from the sum

• For each value of the bandwidth, we calculate the likelihood score as the product of the densities 
at all genesis points 

• We find the value of the bandwidth that gives the highest likelihood score 
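
The procedure above can be sketched as follows. This is a minimal illustration rather than our production code: it assumes planar coordinates, works with the sum of log densities instead of the product of densities (which selects the same bandwidth), and normalises each density by the number of points remaining after the year has been removed, which is one possible reading of the procedure. The function names and the toy data are hypothetical.

```python
import math


def leave_one_year_out_log_likelihood(points_by_year, sigma):
    """Out-of-sample log-likelihood for one bandwidth.

    points_by_year maps a year to a list of (x, y) genesis points.
    Each point is scored with a Gaussian kernel density built from
    all points that do NOT fall in the same year.
    """
    log_lik = 0.0
    for year, held_out in points_by_year.items():
        training = [
            p for yr, pts in points_by_year.items() if yr != year for p in pts
        ]
        norm = 1.0 / (2.0 * math.pi * len(training) * sigma ** 2)
        for x, y in held_out:
            s = sum(
                math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2.0 * sigma ** 2))
                for xi, yi in training
            )
            if s == 0.0:
                # The density underflows to zero: this bandwidth is ruled out.
                return float("-inf")
            log_lik += math.log(norm * s)
    return log_lik


def best_bandwidth(points_by_year, candidates):
    """Return the candidate with the highest score, and all the scores."""
    scores = {
        s: leave_one_year_out_log_likelihood(points_by_year, s)
        for s in candidates
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data in arbitrary planar units (not HURDAT data).
toy_data = {
    1950: [(0.0, 0.0), (1.0, 0.0)],
    1951: [(0.0, 1.0), (1.0, 1.0)],
    1952: [(0.5, 0.5)],
}
best, scores = best_bandwidth(toy_data, [0.001, 1.0, 1000.0])
print(best, scores)
```

Holding out a whole year at a time, rather than a single storm, means that any clustering of genesis points within a season cannot inflate the score.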



We now show the results from the fitting of the kernel density to the observed genesis points. The variation 
of the likelihood score with the bandwidth is shown in figure 1. There is a very clear maximum of the
likelihood function at a bandwidth of 210km. Figure 2 shows the historical hurricane genesis locations,
and estimated densities based on the kernel model. Panel (a) shows the historical genesis locations, panel 
(b) shows a density derived using a bandwidth of 100km, panel (c) shows a density derived using the 
optimal bandwidth of 210km, and panel (d) shows a density derived using a bandwidth of 500km. The 
effects of undersmoothing and oversmoothing can be seen very clearly in panels (b) and (d). 

4.1 Simulations 

Having fitted a probability density function to the observed hurricane genesis points, we can now simulate 
as many hurricane genesis points as we desire. The simulation method we use works as follows: 

• We normalise the density f(x, y) to have a maximum value of 1.

• We simulate random values of (x, y) from a region that covers the entire domain; x and y are
simulated from independent uniform distributions.

• We then either accept or reject each simulated value of (x, y) randomly, with a probability given
by the normalised density. 
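
The accept/reject steps above can be sketched as follows, again as an illustration under stated assumptions rather than a definitive implementation. The sketch assumes planar coordinates and a Gaussian kernel with a single bandwidth. Instead of locating the exact maximum of the density, it divides by the upper bound 1/(2 pi sigma^2), which the density can never exceed; this lowers the acceptance rate but leaves the distribution of accepted points unchanged. The function names, the toy points and the domain bounds are hypothetical.

```python
import math
import random


def gaussian_kde(x, y, points, sigma):
    """Gaussian kernel density at (x, y) with a single bandwidth sigma."""
    s = sum(
        math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2.0 * sigma ** 2))
        for xi, yi in points
    )
    return s / (2.0 * math.pi * len(points) * sigma ** 2)


def simulate_genesis(points, sigma, n_sim, bounds, rng):
    """Draw n_sim points from the kernel density by rejection sampling.

    bounds is (x_min, x_max, y_min, y_max) and should cover the domain.
    """
    x_min, x_max, y_min, y_max = bounds
    # Upper bound on the density, used in place of its exact maximum.
    f_bound = 1.0 / (2.0 * math.pi * sigma ** 2)
    accepted = []
    while len(accepted) < n_sim:
        # Propose a location uniformly over the rectangle.
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        # Accept with probability equal to the scaled density.
        if rng.random() < gaussian_kde(x, y, points, sigma) / f_bound:
            accepted.append((x, y))
    return accepted


# Hypothetical toy points in arbitrary planar units (not HURDAT data).
toy_points = [(-1.0, 0.5), (0.0, 0.0), (2.0, -1.0)]
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
simulated = simulate_genesis(toy_points, 1.0, 200, (-5.0, 5.0, -5.0, 5.0), rng)
print(len(simulated))
```

A filter for unwanted locations, such as points over land, could be applied inside the loop before a point is appended.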

Figure 3 shows three realisations from such simulations, each of 524 points (in panels (b), (c) and
(d)) along with the 524 historical genesis points (in panel (a)). We can see that the simulations follow 
the pattern of historical genesis reasonably closely, but are different in detail, as we would expect. 

One of the shortcomings of the model described above, apparent in figure 3, is that there are a
number of genesis points that have been simulated over land. This is non-physical in most cases, although 
there are occasional genesis points over Florida and the Yucatan in the observations. Non-physical genesis 
points in the simulations can be rejected to solve this problem. 

5 Conclusions 

We have described a statistical model for the location of hurricane genesis. Our model is a non-parametric 
kernel density, with the bandwidth fitted using a cross-validation procedure that optimises the out-of- 
sample likelihood. The advantages of this approach include: 

• not having to define grid boxes 

• the use of a clear merit function 

• making the best use of the historical data by avoiding over- or under-fitting



This model is intended as the first model in a hierarchy, and as such it can probably be beaten by 
a more complex model. Its value is that it sets an initial standard. Because of the use of a well-defined 
merit function it will be easy to check whether the model has been beaten or not. 

There are a number of ways that one might try to improve the performance of this model, such as: 

• The introduction of seasonality. Figure 4 shows the observed genesis points by month. There are
distinctively different patterns in each month. 

• The introduction of different smoothing in the longitude and latitude directions. 

• The use of kernels other than the Gaussian kernel (although we have no particular reason to think
that this will give an improvement, it may). 



In Emanuel et al. (2005) there is some discussion of issues related to whether the historical genesis
data is accurate prior to the introduction of satellite observations in the 1970s. If this is considered an 
important issue then it would be easy to refit the current model but using only the more recent data. 

Figure 1: The out-of-sample log-likelihood versus bandwidth for the kernel density model described in
the text. The maximum log-likelihood is at a bandwidth of 210km. 




Figure 2: Panel (a) shows the observed tropical cyclone genesis locations for the period 1950 to 2004 
(524 points), and panels (b), (c) and (d) show kernel densities estimated using bandwidths of 100km, 
210km and 500km respectively. 




Figure 3: Panel (a) shows the observed genesis locations for tropical cyclones in the period 1950 to 2004. 
There are 524 points. Panels (b), (c) and (d) each show 524 simulated genesis locations from the model 
described in the text. 

Figure 4: The observed tropical cyclone genesis locations for the period 1950 to 2004, by month. There
is a clear variation of genesis location by month, suggesting that the model described in the text could 
possibly be improved by including time as a third dimension in the kernel density. 



